[ENH]: parallel implementation of `is_reachable()` with numpy arrays #119
Conversation
Something along similar lines (in terms of enhancing tournament algorithms): we could also consider taking this reusing-the-pool-of-workers approach for `tournament_is_strongly_connected` (in line 80), and that might show some more improvement in speedups. But all our functions are currently wrapped in a [...]. PS: this doesn't have to be handled in this PR; it's just something we can think about in the long run.
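As a rough illustration of the pool-reuse idea (the `check_chunk` helper and data below are hypothetical stand-ins, not the PR's code):

```python
from joblib import Parallel, delayed


def check_chunk(chunk):
    # hypothetical per-chunk work, standing in for something like
    # the per-chunk test in a tournament algorithm
    return any(chunk)


chunks = [[False, False], [True, False]]

# Using Parallel as a context manager keeps one pool of workers alive,
# so consecutive calls avoid repeated worker start-up costs.
with Parallel(n_jobs=-1) as parallel:
    first_pass = parallel(delayed(check_chunk)(c) for c in chunks)
    second_pass = parallel(delayed(check_chunk)(c) for c in chunks)
```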
nx_parallel/algorithms/tournament.py (outdated diff):
```diff
 return all(tnc)


-def is_closed(G, nodes):
-    return all(v in G[u] for u in set(G) - nodes for v in nodes)
+def is_closed(adjM, nodes):
```
Try checking the speed with and without this function. That is, inline the contents of the function into line 42 so Python can avoid a function call.

Another possible speed-up could be to put the simple checks first in the compound logical statement in line 42. That should avoid the `is_closed` code being run at all unless `s in S and t not in S`.
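For illustration, the suggested ordering relies on Python's short-circuiting `and`; the function names below are stand-ins for the PR's code, not its actual API:

```python
def is_closed(adjM, S):
    # stand-in for the expensive closure test in the PR
    ...


def closed_set_separates(adjM, S, s, t):
    # `and` evaluates left to right and stops at the first False,
    # so is_closed() never runs unless s in S and t not in S.
    return s in S and t not in S and is_closed(adjM, S)
```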
Do we know where this is taking most of its time? I was just guessing for these suggestions based on the for-loop structure. Using the IPython tool `%prun` you might be able to tell where the code spends the most time -- but I don't know how that works with parallel joblib. Maybe try it with 1 CPU or with the parallel code part removed.
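For a script (outside IPython, where `%prun` isn't available), a minimal standard-library equivalent could look like this:

```python
import cProfile
import pstats

import networkx as nx

G = nx.algorithms.tournament.random_tournament(1600)

# Profile the sequential call and sort by cumulative time to see
# which functions dominate the runtime.
cProfile.run("nx.algorithms.tournament.is_reachable(G, 0, 1599)", "prof.out")
pstats.Stats("prof.out").sort_stats("cumulative").print_stats(10)
```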
> Try checking the speed with and without this function. That is, inline the contents of the function into line 42 so Python can avoid a function call.
I tried adding this logic to line 42, but it didn't result in any noticeable speedups, so I've decided to keep it as a separate function for better readability. Please let me know if you think otherwise.
> Do we know where this is taking most of its time? I was just guessing for these suggestions based on the for-loop structure. Using the IPython tool `%prun` you might be able to tell where the code spends the most time -- but I don't know how that works with parallel joblib. Maybe try it with 1 CPU or with the parallel code part removed.
I removed the parallel execution and ran `%prun` using the following input:

```python
G = nx.algorithms.tournament.random_tournament(1600)
s, t = 0, 1599
```
A summary of the output:

```
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      3/1    0.000    0.000    5.383    5.383 {built-in method builtins.exec}
        1    0.000    0.000    5.383    5.383 <string>:1(<module>)
        1    0.000    0.000    5.383    5.383 decorators.py:17(wrapper)
        1    0.000    0.000    5.383    5.383 tournament.py:14(is_reachable)
        1    0.728    0.728    4.324    4.324 tournament.py:30(two_neighborhood_close)
  1279201    0.296    0.000    2.851    0.000 {built-in method builtins.any}
  6404724    0.878    0.000    2.674    0.000 tournament.py:40(<genexpr>)
 10230374    2.418    0.000    2.418    0.000 memmap.py:357(__getitem__)
        1    0.000    0.000    1.056    1.056 decorators.py:783(func)
        1    0.000    0.000    1.056    1.056 <class 'networkx.utils.decorators.argmap'> compilation 21:1(argmap_to_numpy_array_18)
        1    0.000    0.000    1.042    1.042 backends.py:538(_call_if_any_backends_installed)
        1    0.006    0.006    1.042    1.042 backends.py:1460(_call_with_backend)
        1    0.484    0.484    1.035    1.035 convert_matrix.py:881(to_numpy_array)
```
It is evident from this that the bulk of the time is spent computing two-hop neighborhoods and on memory access via `memmap`. I've tried the other alternatives you mentioned, but this version seems to perform the best.
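If the `memmap` access cost turns out to be the bottleneck, one knob worth noting is joblib's `max_nbytes`, which controls whether large array arguments are memmapped at all. This is a generic joblib sketch, not the PR's code:

```python
import numpy as np
from joblib import Parallel, delayed

adjM = np.zeros((1600, 1600), dtype=bool)

# joblib memmaps numpy array arguments larger than max_nbytes (default
# "1M") to share them between workers; max_nbytes=None disables
# memmapping so each worker gets a plain in-memory copy instead.
counts = Parallel(n_jobs=2, max_nbytes=None)(
    delayed(np.count_nonzero)(adjM[i : i + 800]) for i in (0, 800)
)
```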
Hi @Schefflera-Arboricola and @dschult, I'm trying to run asv benchmarks for [...]
How is your environment set up? "conda/mamba" or "venv + pip"?
Apologies for the oversight! I was able to figure it out from the documentation. Thank you @dschult!
I tried out the following benchmarks:

```
========== ============ ========== =========== ============ ============
--                               num_nodes
---------- -------------------------------------------------------------
 backend        50          100        200         400          800
========== ============ ========== =========== ============ ============
 parallel   6.19±0.4ms   14.5±2ms   44.7±1ms    166±1ms      655±2ms
 networkx   18.5±3ms     74.3±3ms   288±0.7ms   1.13±0.01s   4.53±0.03s
========== ============ ========== =========== ============ ============
```

```
========== ============ ========== ============ ========== ============
--                               num_nodes
---------- ------------------------------------------------------------
 backend        50          100        200          400        800
========== ============ ========== ============ ========== ============
 parallel   1.15±0.1ms   5.00±1ms   17.5±0.3ms   67.8±1ms   270±0.7ms
 networkx   18.3±1ms     73.7±1ms   289±3ms      1.12±0s    4.59±0.03s
========== ============ ========== ============ ========== ============
```
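For reference, an asv benchmark producing tables of this shape could look roughly like the sketch below; the class and method names are illustrative, and the `backend=` keyword assumes a NetworkX version with backend dispatching:

```python
import networkx as nx


class TournamentIsReachable:
    params = [["parallel", "networkx"], [50, 100, 200, 400, 800]]
    param_names = ["backend", "num_nodes"]

    def setup(self, backend, num_nodes):
        self.G = nx.algorithms.tournament.random_tournament(num_nodes, seed=42)

    def time_is_reachable(self, backend, num_nodes):
        nx.algorithms.tournament.is_reachable(
            self.G, 0, num_nodes - 1, backend=backend
        )
```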
The pure Python implementation performs better. I haven't committed the pure Python code yet because I wanted to confirm a few things first. Do these benchmarks imply that converting the graph into a numpy matrix is more expensive than passing the huge graph to each core? Because [...]
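One way to probe that hypothesis is to time the conversion step in isolation, separate from the reachability computation itself (a quick sketch using stock NetworkX functions):

```python
import timeit

import networkx as nx

G = nx.algorithms.tournament.random_tournament(800)

# Time only the graph-to-matrix conversion, to see how much of the
# per-call cost it accounts for on its own.
per_call = timeit.timeit(lambda: nx.to_numpy_array(G), number=5) / 5
print(f"to_numpy_array(G): {per_call:.3f}s per call")
```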
Pure Python parallelism after the recent NetworkX commit, for reference. The snippet is reassembled here as a self-contained function: the enclosing `is_reachable` signature (which supplies `s` and `t` to the inner closures) and the imports are our reconstruction, following the usual nx-parallel conventions.

```python
from joblib import Parallel, delayed

import nx_parallel as nxp


def is_reachable(G, s, t, get_chunks="chunks"):
    def two_neighborhood_close(G, chunk):
        for v in chunk:
            v_adj = G._adj[v]
            # S: nodes reachable from v in at most two hops
            S = {
                x
                for x, x_pred in G._pred.items()
                if x == v or x in v_adj or any(z in v_adj for z in x_pred)
            }
            if s in S and t not in S and is_closed(G, S):
                return True
        return False

    def is_closed(G, S):
        # S is closed when every node outside S points to every node in S
        return all(
            u in S or all(v in unbrs for v in S) for u, unbrs in G._adj.items()
        )

    if hasattr(G, "graph_object"):
        G = G.graph_object

    n_jobs = nxp.get_n_jobs()

    if get_chunks == "chunks":
        node_chunks = nxp.chunks(G, n_jobs)
    else:
        node_chunks = get_chunks(G)

    # t is reachable from s iff no chunk finds a closed set separating them
    return not any(
        Parallel()(delayed(two_neighborhood_close)(G, chunk) for chunk in node_chunks)
    )
```
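A possible way to exercise the parallel version through the dispatcher, assuming the standard nx-parallel setup where the backend is registered under the name "parallel":

```python
import networkx as nx

G = nx.algorithms.tournament.random_tournament(400)

# Route the call to the nx-parallel backend explicitly.
reachable = nx.algorithms.tournament.is_reachable(G, 0, 399, backend="parallel")
print(reachable)
```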
Which version of the config is used when [...]? Can you point me to your benchmarking code? Is it already merged in nx-parallel, or is it only on your local machine?
I didn't modify the number of jobs anywhere in the codebase, so by default, [...]

The local changes I made were: [...]
- `is_reachable()` yielding better speedups (ref. comment).
- `G` shared across multiple cores via joblib's memmapping.